Scaling Down Text Encoders of Text-to-Image Diffusion Models